different color
Robustness in Both Domains: CLIP Needs a Robust Text Encoder
Rocamora, Elias Abad, Schlarmann, Christian, Singh, Naman Deep, Wu, Yongtao, Hein, Matthias, Cevher, Volkan
Adversarial input attacks can cause a significant shift of CLIP embeddings. This can affect the downstream robustness of models incorporating CLIP in the pipeline, such as text-to-image generative models or large vision language models. While some efforts have been done towards making the CLIP image encoders robust, the robustness of text encoders remains unexplored. In this work, we cover this gap in the literature. We propose LEAF: an efficient adversarial fine-tuning method for the text domain, with the ability to scale to large CLIP models. Our models significantly improve the zero-shot adversarial accuracy in the text domain, while maintaining the vision performance provided by robust image encoders. When combined with text-to-image diffusion models, we can improve the generation quality under adversarial noise. In multimodal retrieval tasks, LEAF improves the recall under adversarial noise over standard CLIP models. Finally, we show that robust text encoders facilitate better reconstruction of input text from its embedding via direct optimization.
Closed-Form Last Layer Optimization
Galashov, Alexandre, Da Costa, Nathaรซl, Xu, Liyuan, Hennig, Philipp, Gretton, Arthur
Neural networks are typically optimized with variants of stochastic gradient descent. Under a squared loss, however, the optimal solution to the linear last layer weights is known in closed-form. We propose to leverage this during optimization, treating the last layer as a function of the backbone parameters, and optimizing solely for these parameters. We show this is equivalent to alternating between gradient descent steps on the backbone and closed-form updates on the last layer. We adapt the method for the setting of stochastic gradient descent, by trading off the loss on the current batch against the accumulated information from previous batches. Further, we prove that, in the Neural Tangent Kernel regime, convergence of this method to an optimal solution is guaranteed. Finally, we demonstrate the effectiveness of our approach compared with standard SGD on a squared loss in several supervised tasks -- both regression and classification -- including Fourier Neural Operators and Instrumental Variable Regression.
An Empirical Investigation of Domain Generalization with Empirical Risk Minimizers (Appendix)
See table 1 for the results. We next perform regression in the Joint setting (Sec.5.3, main paper) where we fit a regression model across all environments, with 5 features instead of 2 reported in the main We find that it is possible to get an Spearman's We considered a set of 40 metrics overall and report only a small subset of them in the main paper. In table 2 we provide detailed results of all the measures we study. Figure 1 provides details of the canonicalization performed on each of the measures as explained in the main paper. In particular, (Ben-David et al., 2007) prove We also develop measures based on follow-up theoretical work in (Ben-David et al., 2010) on divergence measures using the symmetric difference hypothesis space. Here we summarize a result from (Ben-David et al., 2010), This canonicalization is used to report the results in Sec. 5 H: Z P (Y), we follow the steps in algorithm 1. Algorithm 1 Computing H -divergence measure As explained in the main paper, this divergence measure was proposed in (Ben-David et al., 2010).
Review for NeurIPS paper: Fair Hierarchical Clustering
Additional Feedback: Line 68: Kleindessner et al. designed an algorithm for k-center with different type of fairness requirement. Instead of balancing different colors in each cluster, the goal is to pick centers (proportionally) from different colors. It is basically k-center under partition matroid. Line 69-70: In a(n almost) concurrent work, the fair correlation was also studied by Ahamdi et al. Line 131: Bounded representation: with binary colors, it is the same as balance.
Time Series Analysis of Rankings: A GARCH-Type Approach
Piancastelli, Luiza, Barreto-Souza, Wagner
Ranking data are frequently obtained nowadays but there are still scarce methods for treating these data when temporally observed. The present paper contributes to this topic by proposing and developing novel models for handling time series of ranking data. We introduce a class of time-varying ranking models inspired by the Generalized AutoRegressive Conditional Heteroskedasticity (GARCH) models. More specifically, the temporal dynamics are defined by the conditional distribution of the current ranking given the past rankings, which are assumed to follow a Mallows distribution, which implicitly depends on a distance. Then, autoregressive and feedback components are incorporated into the model through the conditional expectation of the associated distances. Theoretical properties of our ranking GARCH models such as stationarity and ergodicity are established. The estimation of parameters is performed via maximum likelihood estimation when data is fully observed. We develop a Monte Carlo Expectation-Maximisation algorithm to deal with cases involving missing data. Monte Carlo simulation studies are presented to study the performance of the proposed estimators under both non-missing and missing data scenarios. A real data application about the weekly ranking of professional tennis players from 2015 to 2019 is presented under our proposed ranking GARCH models.
Vision Language Models See What You Want but not What You See
Gao, Qingying, Li, Yijiang, Lyu, Haiyun, Sun, Haoran, Luo, Dezhi, Deng, Hokin
Knowing others' intentions and taking others' perspectives are two core components of human intelligence typically considered as instantiations of theory of mind. Infiltrating machines with these abilities is an important step towards building human-level artificial intelligence. We here investigate intentionality understanding and perspective-taking in Vision Language Models and, for the purpose, we have created IntentBench and PerspectBench datasets, which contain over 400 cognitive experiments grounded in real-world scenarios and classic cognitive tasks. Surprisingly, we find that VLMs achieve high performance in intentionality understanding but lower performance in perspective-taking using our two datasets. This challenges the common belief in the cognitive science literature that perspective-taking at the corresponding modality is necessary for intentionality understanding.
27 Prime Day Toy Deals On Stuff Our Kids Love (2024)
If you've ever battled crowds to get a coveted toy you know that early October is the perfect time for parents to start holiday shopping. Amazon knows this, which is why the company is holding a second Prime Day sale event--which ends tonight--featuring some great Prime Day toy deals. You can find all the best Prime Day deals here. But if your kids are like my kids, they're already working on their wish lists. If you say you haven't already started budgeting, you are either lying, financially irresponsible, or your children are much less demanding than mine are (I know, it's my fault).
Policy Adaptation from Foundation Model Feedback
Ge, Yuying, Macaluso, Annabella, Li, Li Erran, Luo, Ping, Wang, Xiaolong
Recent progress on vision-language foundation models have brought significant advancement to building general-purpose robots. By using the pre-trained models to encode the scene and instructions as inputs for decision making, the instruction-conditioned policy can generalize across different objects and tasks. While this is encouraging, the policy still fails in most cases given an unseen task or environment. In this work, we propose Policy Adaptation from Foundation model Feedback (PAFF). When deploying the trained policy to a new task or a new environment, we first let the policy play with randomly generated instructions to record the demonstrations. While the execution could be wrong, we can use the pre-trained foundation models to provide feedback to relabel the demonstrations. This automatically provides new pairs of demonstration-instruction data for policy fine-tuning. We evaluate our method on a broad range of experiments with the focus on generalization on unseen objects, unseen tasks, unseen environments, and sim-to-real transfer. We show PAFF improves baselines by a large margin in all cases. Our project page is available at https://geyuying.github.io/PAFF/